ABSTRACT

A computer simulation of Bayes' Theorem was conducted
in order to determine the probability that an examinee was a master
conditional upon his test score. The inputs were: number of mastery
states assumed, test length, prior expectation of masters in the
examinee population, and conditional probability of a master getting
a randomly selected test item correct, and of getting an item
incorrect. Classification accuracy was shown to be a function of all
of the above parameters for any specified level of mastery (in the
criterion-referenced sense). Specific results showed that for some
combinations of prior information and test length, no information
from the test could force a reversal in the decision rule, or provide
classification accuracy within acceptable error bounds; hence, test
results would be irrelevant. The vulnerability of a Bayesian model to
changes in the prior probabilities was also demonstrated. For
example, a 10% change in conditional probability was sufficient to
completely reverse a classification rule across all test lengths
studied, when the prior probability was held constant. Less drastic
shifts occurred with changes in the prior probabilities. (Author)

Objectives: The educational decision-maker often wants to know if an examinee
has mastered a sequence of instruction at some pre-specified level of
acceptability. If the test score is above the minimal passing standard, the
examinee may be classified as having mastered the instruction; if his score is
below the minimal standard, he would be termed a "nonmaster" of the instruction.
Because of many sources of variability, misclassifications are likely to occur,
as shown in the following figure.

The objective of the educational decision-maker is to maximize the true
classifications (True Positives and True Negatives) and to minimize the false
classifications (False Positives and False Negatives). The datum of interest
is the conditional probability that a particular examinee is in a particular
state of mastery, given his test score. The objective of the present paper
is to examine the effect of such variables as test length, number of
hypothesized mastery states, and the quality of the examinee population, on the
probability that an examinee is in a particular state of mastery given his test
score. Specifically, the following two questions were addressed: (a) What is
the probability of (in)correctly classifying an examinee on the basis of his
test score, and (b) How long must a test be, and what score is required, so that
classification decisions might be made with some specified lower limit of
misclassification?

Theoretical framework: The statistical model which was used for classifying
students into various mastery groupings, given their test scores, is based
upon Bayes' Theorem:

$$
P(M_i \mid T) = \frac{\prod_{j=1}^{N} p(M_i \mid t_j) \,/\, p(M_i)^{N-1}}{\sum_{k=1}^{S} \left[ \prod_{j=1}^{N} p(M_k \mid t_j) \,/\, p(M_k)^{N-1} \right]}
$$

where P(Mi|T) is the conditional probability of a particular student being
classified as belonging in the ith mastery state given his test score; N is
the test length; S is the number of mastery states hypothesized by the
decision maker; p(Mi|tj) is the conditional probability of the ith mastery
state given the response to the jth test item, which is obtained from the
conditional probability of a person in the ith mastery state getting the jth
test item correct; and p(Mi) is the prior probability of the representation
of the ith mastery state in the examinee population (the % of examinees who
are estimated to be in the ith mastery state).

It is assumed that the mastery states are mutually exclusive, the test items
are of equal difficulty, that the test is a test of unitary skills, and that
there is independence among items.
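
As a rough illustration of the Bayesian model described above, the posterior can be sketched in Python in the usual prior-times-likelihood form. Under the stated assumptions (independent items of equal difficulty) the binomial coefficient is the same for every mastery state and cancels, so it is omitted. The function name `mastery_posterior` and its argument names are hypothetical, chosen for this sketch; they do not come from the original study.

```python
def mastery_posterior(priors, p_correct, n_items, n_correct):
    """Return P(M_i | T) for each mastery state i.

    priors    -- prior probability p(M_i) of each state (should sum to 1)
    p_correct -- probability that a person in state i gets any one item right
    n_items   -- test length N
    n_correct -- number of items answered correctly
    """
    n_wrong = n_items - n_correct
    # Prior times the likelihood of the observed score, state by state.
    weights = [
        prior * p ** n_correct * (1.0 - p) ** n_wrong
        for prior, p in zip(priors, p_correct)
    ]
    total = sum(weights)
    return [w / total for w in weights]


# Illustrative two-state case: prior .9 for mastery, 7 of 10 items correct.
# The item probabilities .9 and .6 are assumed values for this example.
posterior = mastery_posterior([0.9, 0.1], [0.9, 0.6], n_items=10, n_correct=7)
print([round(p, 3) for p in posterior])
```

With these assumed inputs the probability of mastery comes out at roughly .71.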

Methods and Techniques: A computer simulation of the Bayesian model was
conducted using the following data:

(1) Test length (N) took on values of 5, 10, 20, 40 items;

(2) Number of hypothesized mastery states (S) varied from 2 to 3;

(3) Prior probability of mastery for a given examinee (p(M1)) took on values
of .9, .7, .5 when two mastery states were assumed;

(4) Prior probabilities of mastery states 1, 2, and 3 took on values of .5,
.3, and .2, respectively; and .25, .50, and .25, respectively, when three
mastery states were hypothesized;

(5) Assuming two mastery states, the conditional probabilities of a master
getting any single item correct took on the values of .9, .8, and .7; and for
a nonmaster getting any single item correct, the values were .6, .5, and .4
(indicated by p(1|Mi) in the Figures);

(6) Assuming three mastery states, the conditional probabilities of a master
(M1), intermediate master (M2), and nonmaster (M3) getting any single item
correct were .8, .6, and .5, respectively, and another set consisted of .9,
.8, and .2, respectively (p(1|Mi));

(7) The per cent correct observed scores took on the values of 60%, 70%, and
80%.
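
The two-state part of this design can be sketched as a parameter grid. This is only an illustration: it assumes that every listed value was crossed with every other (the text does not say whether the master and nonmaster probabilities were fully crossed), and it treats the number of correct items as a possibly fractional quantity, since 70% of 5 items is not a whole number. The names `p_master` and `grid` are hypothetical.

```python
from itertools import product

# Values listed in items (1), (3), (5), and (7) for the two-state case.
TEST_LENGTHS = (5, 10, 20, 40)
PRIOR_MASTER = (0.9, 0.7, 0.5)
P_MASTER = (0.9, 0.8, 0.7)
P_NONMASTER = (0.6, 0.5, 0.4)
PCT_CORRECT = (0.6, 0.7, 0.8)


def p_master(n_items, prior, p_m, p_n, pct):
    """P(master | score) when a fraction `pct` of `n_items` is correct."""
    right = pct * n_items      # may be fractional (assumption of this sketch)
    wrong = n_items - right
    w_master = prior * p_m ** right * (1 - p_m) ** wrong
    w_nonmaster = (1 - prior) * p_n ** right * (1 - p_n) ** wrong
    return w_master / (w_master + w_nonmaster)


# Assumption: all conditions are fully crossed.
grid = {
    cond: p_master(*cond)
    for cond in product(TEST_LENGTHS, PRIOR_MASTER, P_MASTER, P_NONMASTER, PCT_CORRECT)
}
print(len(grid))  # number of simulated two-state conditions under this assumption
```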

Data Source: The conditional probabilities in (5) and (6) were needed in order
to obtain the values for the p(Mi|tj) in the preceding formula. Along with an
estimate of one of these conditional probabilities, it is assumed that the
decision-maker could also supply an estimate of the prior probabilities for
the states of mastery, the number of items on the test, and the number of
mastery states. The only thing that he would observe is the per cent of the
items that a given examinee got correct.

Results and Conclusions: Only a small portion of the results from the
simulation can be described in the present abstract. Discussion must therefore
be restricted to a case in which two states of mastery were assumed and
the prior expectation of finding a master was equal to .9. The curvature of
each line in Figure 1 shows how the probability of claiming that an examinee
is a master given his test score changes as a function of test length, per
cent correct observed, and conditional probabilities of a master and nonmaster
getting any single item correct. (Additional graphs would show the effect of
varying the prior expectation of mastery on p(M|T).) In this example, the
prior expectation of finding a master in the examinee population is 90%.
The conditional probabilities in A, B, C, and D show the probabilities of a
master (M1) and nonmaster (M2) getting a typical item correct. Test length
is plotted on the abscissa and the probability of the examinee's being a
master (M1) given his observed test score (based upon % correct of the total
test length) is plotted on the ordinate.

The effect of the test length variable on classification accuracy is
dramatic: if the p(M1|T) had to be at least .5 for a person to be called
a master, then scores of 70% correct on a 10 item test would lead to a
"mastery" classification. But a 70% score on a 20 item test would lead to
a "nonmastery" classification. (Graph 1A)
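
The reversal can be reproduced with a small sketch written in log-odds form. It assumes that the graph in question uses p(1|M1) = .9 and p(1|M2) = .6 with a prior of .9; the text does not state which nonmaster value was plotted, so .6 is an assumption, as are the function names.

```python
from math import exp, log

PRIOR = 0.9        # prior probability of mastery (from the text)
P_MASTER = 0.9     # p(1|M1), assumed for this graph
P_NONMASTER = 0.6  # p(1|M2), assumed for this graph


def log_odds_master(n_items, pct_correct):
    """Posterior log odds of mastery for a test length and percent correct."""
    prior_term = log(PRIOR / (1 - PRIOR))
    # Average evidence contributed by one item at this percent correct.
    per_item = (pct_correct * log(P_MASTER / P_NONMASTER)
                + (1 - pct_correct) * log((1 - P_MASTER) / (1 - P_NONMASTER)))
    return prior_term + n_items * per_item


def p_master(n_items, pct_correct):
    """Posterior probability of mastery, converted from log odds."""
    return 1 / (1 + exp(-log_odds_master(n_items, pct_correct)))


for n in (5, 10, 20, 40):
    p = p_master(n, 0.70)
    print(n, round(p, 2), "master" if p >= 0.5 else "nonmaster")
```

Under these assumptions a 70% score carries slightly negative evidence per item, so every added item lowers the log odds: the probability of mastery is about .71 at 10 items and about .39 at 20 items, which is the reversal described above.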



The effect of varying probabilities of a master making a correct response,
p(correct|M1), can be seen by comparing graphs A, B, C, and D. For any test
length or observed test score, the probability of being in the mastery state
is greater in B than in A. This shift is most obvious for the 70% correct
curve. Note that p(M1|T) for A for an observed score of 70% (28 out of 40
correct) is approximately .04. However, the p(M1|T) in B for 70% of a 40
item test correct is .87. The main reason for this abrupt change is the
lowered requirement for mastery, from .9 to .8. The probability that ".9
persons" score only 70% on long tests is quite low, whereas for ".8 persons"
the probability of scoring 70% is rather high. Graphs 1C and 1D illustrate
further changes in the classification probability due to only .1 step changes
in the probabilities of masters and nonmasters making a correct response.
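
The comparison between A and B can be checked with a short sketch. It assumes a nonmaster probability of .6 in both graphs, which the text does not state; the helper names `binom_pmf` and `p_master` are hypothetical.

```python
from math import comb


def binom_pmf(k, n, p):
    """Probability of exactly k correct answers on n independent items."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


def p_master(k, n, prior, p_m, p_n):
    """P(master | k of n correct) for a two-state model."""
    w_m = prior * binom_pmf(k, n, p_m)
    w_n = (1 - prior) * binom_pmf(k, n, p_n)
    return w_m / (w_m + w_n)


K, N, PRIOR = 28, 40, 0.9   # 70% of 40 items, prior .9 (from the text)
P_NONMASTER = 0.6           # assumed; not stated for graphs A and B

for label, p_m in (("A", 0.9), ("B", 0.8)):
    likelihood = binom_pmf(K, N, p_m)
    posterior = p_master(K, N, PRIOR, p_m, P_NONMASTER)
    print(label, f"{likelihood:.2e}", round(posterior, 2))
```

Under these assumptions, a person who answers 90% of items correctly on average is more than a hundred times less likely to score exactly 28 of 40 than a person who answers 80% correctly, and the posterior moves from about .04 to about .87, in line with the values quoted for A and B.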

The same data from Figure 1A can be used to answer the second question
presented earlier: How long must a test be, and what score is required for
classification decisions to be made with some specified lower limit of
misclassification? Inspection of the curves in Figure 2 reveals that test
length markedly influences classification accuracy. For the 40 item test,
the region where p(M1|T) is greater than .1 and less than .9 extends from
71% to 77%. This means that the probability of misclassifying an examinee
will exceed .10 only when observed scores range from 71% to 77% correct. In
contrast, the region of the five item test for which p(M1|T) is greater than
.10 and less than .90 extends from about 26% to about 79%. Hence, there is
a much larger region for which the probability of misclassification exceeds
.10. This procedure therefore shows what scores must be obtained so that a
nonmastery decision could be made with at least 90% confidence; such scores,
in effect, force a reversal in the prior beliefs of the decision maker.
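
The boundaries of this uncertain region can be approximated by solving the two-state model for the score at which the posterior equals a target value. The sketch below assumes p(1|M1) = .9, p(1|M2) = .6, and a prior of .9 (the nonmaster value is an assumption), and treats the score as continuous; `score_bounds` is a hypothetical name.

```python
from math import log


def score_bounds(n_items, prior, p_m, p_n, lo=0.10, hi=0.90):
    """Fractions correct at which P(master | score) equals lo and hi."""
    prior_term = log(prior / (1 - prior))
    gain_right = log(p_m / p_n)               # log-odds change per correct item
    gain_wrong = log((1 - p_m) / (1 - p_n))   # log-odds change per wrong item

    def fraction_at(target):
        logit = log(target / (1 - target))
        n_right = (logit - prior_term - n_items * gain_wrong) / (gain_right - gain_wrong)
        return n_right / n_items

    return fraction_at(lo), fraction_at(hi)


for n in (5, 10, 20, 40):
    low, high = score_bounds(n, prior=0.9, p_m=0.9, p_n=0.6)
    print(n, round(100 * low), round(100 * high))
```

Under these assumptions the bounds for 40 items come out near 71% and 77%. For 5 items they come out near 28% and 77%, close to the approximate values read from the curves, and the uncertain region is several times wider.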

Importance of the study: The Bayesian approach has been taken by others in
devising methods for classifying examinees on the basis of test length and
examinee qualities. However, the present version is less theoretically
cumbersome, and gives a straightforward description of how classification
accuracy is sensitive to the above variables. A general finding demonstrated
by, but not necessarily limited to, a Bayesian model is that setting
percentage cutoff scores as a means for defining mastery must take into
account the test length. Classification accuracy is not invariant with
percent correct. A specific result peculiar only to a Bayesian model is that
classification accuracy is also a function of the qualities of the examinee
population, or at least the decision-maker's estimate of those qualities.
The model also allows confidence limits to be set for a given test when the
examinee population qualities have been specified; these confidence limits
then constrain the region of acceptable scores. Thus, if a region of
misclassification error can be tolerated by the decision maker for a given
population, the model specifies what the test length must be and what range
of scores must be obtained in order to stay within the desired acceptable
region.

Assume that there are three states of mastery, and unequal prior
probabilities for these three states. The educational decision-maker must
provide estimates for the prior probabilities of mastery, p(Mi). For this
example let us assume the values to be p(M1) = .5; p(M2) = .3; and
p(M3) = .2. He must also provide estimates for the conditional probability
of getting any given test item right, given each mastery state. The
following values will be used as the conditional probability of getting an
item right given a mastery state: p(1|M1) = .8; p(1|M2) = .6; p(1|M3) = .5.
The conditional probabilities of getting an item wrong given a mastery state
are: p(0|M1) = .2; p(0|M2) = .4; and p(0|M3) = .5.

First we need to calculate the probability that an item is answered
correctly. For the overall population,

$$
p(t_j = \text{correct}) = \sum_{i=1}^{S} p(M_i)\, p(t_j = \text{correct} \mid M_i) = (.5)(.8) + (.3)(.6) + (.2)(.5) = .68.
$$

Likewise,

$$
p(t_j = \text{wrong}) = \sum_{i=1}^{S} p(M_i)\, p(t_j = \text{wrong} \mid M_i) = (.5)(.2) + (.3)(.4) + (.2)(.5) = .32.
$$

We also need to obtain the set of conditional probabilities for the
different mastery states given that an individual item was responded to
either correctly or wrongly. The general equation is:

$$
p(M_i \mid t_j) = \frac{p(M_i)\, p(t_j \mid M_i)}{p(t_j)}.
$$

Substituting the above values yields:
p(M1|tj = correct) = (.5)(.8) ÷ .68 = .588;
p(M2|tj = correct) = (.3)(.6) ÷ .68 = .265;
and p(M3|tj = correct) = (.2)(.5) ÷ .68 = .147.
(Note that the sum equals 1.0.) Finally,
p(M1|tj = wrong) = (.5)(.2) ÷ .32 = .3125;
p(M2|tj = wrong) = (.3)(.4) ÷ .32 = .375; and
p(M3|tj = wrong) = (.2)(.5) ÷ .32 = .3125.
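
The single-item step can be sketched as follows, using the priors and item probabilities given in this example. The function name `item_posteriors` and the dictionary layout are hypothetical choices for the sketch.

```python
PRIORS = {"M1": 0.5, "M2": 0.3, "M3": 0.2}    # p(Mi), from the example
P_RIGHT = {"M1": 0.8, "M2": 0.6, "M3": 0.5}   # p(1|Mi), from the example


def item_posteriors(answered_correctly):
    """Return p(tj) and p(Mi|tj) for a single right or wrong response."""
    likelihood = {
        state: P_RIGHT[state] if answered_correctly else 1 - P_RIGHT[state]
        for state in PRIORS
    }
    # Marginal probability of the response over the whole population.
    p_response = sum(PRIORS[s] * likelihood[s] for s in PRIORS)
    posterior = {s: PRIORS[s] * likelihood[s] / p_response for s in PRIORS}
    return p_response, posterior


for outcome in (True, False):
    p_response, posterior = item_posteriors(outcome)
    print(round(p_response, 2), {s: round(p, 4) for s, p in posterior.items()})
```

For a correct response the first posterior is .4/.68, about .588, and the three posteriors sum to 1.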
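If 6 items were answered correctly on a 10 item criterion-referenced test,
the following values of the product of the p(Mi|tj) over the N items result:

$$
\prod_{j=1}^{N} p(M_1 \mid t_j) = 3.9 \times 10^{-4}; \quad
\prod_{j=1}^{N} p(M_2 \mid t_j) = 6.8 \times 10^{-6}; \quad
\prod_{j=1}^{N} p(M_3 \mid t_j) = 9.6 \times 10^{-8}.
$$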

Finally, the general Bayesian formula yields the conditional probability
for each mastery state given the total test score. For example,

$$
p(M_1 \mid T) = \frac{(3.9 \times 10^{-4}) / (.5)^{9}}{(3.9 \times 10^{-4}) / (.5)^{9} + (6.8 \times 10^{-6}) / (.3)^{9} + (9.6 \times 10^{-8}) / (.2)^{9}} = .272.
$$

Similar calculations yield p(M2|T) = .473 and p(M3|T) = .254.
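
The whole three-state calculation can be traced in a short sketch that computes the per-item products, applies the division by p(Mi)^(N-1), and compares the result with a direct prior-times-likelihood calculation. The variable names are hypothetical. The two forms should agree because each p(Mi|tj) carries one factor of p(Mi), so dividing the product by p(Mi)^(N-1) leaves a single prior, and the p(tj) terms are common to all states and cancel on normalization.

```python
PRIORS = (0.5, 0.3, 0.2)     # p(M1), p(M2), p(M3)
P_RIGHT = (0.8, 0.6, 0.5)    # p(1|M1), p(1|M2), p(1|M3)
N_ITEMS, N_RIGHT = 10, 6
N_WRONG = N_ITEMS - N_RIGHT

p_item_right = sum(pr * p for pr, p in zip(PRIORS, P_RIGHT))   # .68
p_item_wrong = 1 - p_item_right                                 # .32

# Product of the per-item posteriors p(Mi|tj) over the 10 responses.
products = []
for prior, p in zip(PRIORS, P_RIGHT):
    post_right = prior * p / p_item_right
    post_wrong = prior * (1 - p) / p_item_wrong
    products.append(post_right ** N_RIGHT * post_wrong ** N_WRONG)

# Product form: divide by p(Mi)^(N-1), then normalize over the states.
scaled = [x / prior ** (N_ITEMS - 1) for x, prior in zip(products, PRIORS)]
scaled_total = sum(scaled)
product_form = [s / scaled_total for s in scaled]

# Direct form: prior times likelihood of the score, normalized.
weights = [prior * p ** N_RIGHT * (1 - p) ** N_WRONG
           for prior, p in zip(PRIORS, P_RIGHT)]
weight_total = sum(weights)
direct = [w / weight_total for w in weights]

print([f"{x:.2e}" for x in products])
print([round(p, 3) for p in product_form])
print([round(p, 3) for p in direct])
```

With unrounded intermediates the posteriors come out at about .275, .469, and .256. Rounding the products to two significant figures, as printed in the example, gives .272 for M1, so the small differences from the printed values appear to come from rounding.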

In order to combine mastery states M2 and M3 into a single mastery state
(which could represent combining the two degrees of nonmastery, Figure 4,
Graph D), the following calculations are required. The values for p(M1) and
the product of the p(M1|tj) over the N items remain the same, .5 and
3.9 × 10^-4 respectively. The new nonmastery state (M2') occurs as a result
of combining the previous states M2 and M3. Hence,
p(M2') = p(M2) + p(M3) = .3 + .2 = .5,
p(M2'|tj = correct) = p(M2|tj = correct) + p(M3|tj = correct) = .265 + .147 = .412, and
p(M2'|tj = wrong) = p(M2|tj = wrong) + p(M3|tj = wrong) = .375 + .3125 = .6875.

Calculation of the product of the p(M2'|tj) over the N items yields

$$
\prod_{j=1}^{N} p(M_2' \mid t_j) = 1.09 \times 10^{-3}.
$$

Entering these new values into the general Bayesian formula, the following
values of p(M1'|T) and p(M2'|T) are obtained:

$$
p(M_1' \mid T) = \frac{(3.9 \times 10^{-4}) / (.5)^{9}}{(3.9 \times 10^{-4}) / (.5)^{9} + (1.09 \times 10^{-3}) / (.5)^{9}} = .264
$$

$$
p(M_2' \mid T) = \frac{(1.09 \times 10^{-3}) / (.5)^{9}}{(3.9 \times 10^{-4}) / (.5)^{9} + (1.09 \times 10^{-3}) / (.5)^{9}} = .736
$$
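
The combination step can be retraced with the sketch below, which follows the arithmetic of this example: the per-item posteriors of M2 and M3 are added, the products are recomputed, and the two-state formula is applied. Variable names such as `right_m2c` are hypothetical, and the sketch uses unrounded intermediates.

```python
PRIOR_M1 = 0.5
PRIOR_M2C = 0.3 + 0.2          # p(M2') = p(M2) + p(M3)
N_ITEMS, N_RIGHT = 10, 6
N_WRONG = N_ITEMS - N_RIGHT

# Per-item posteriors from the three-state example (unrounded).
post_right = {"M1": 0.40 / 0.68, "M2": 0.18 / 0.68, "M3": 0.10 / 0.68}
post_wrong = {"M1": 0.10 / 0.32, "M2": 0.12 / 0.32, "M3": 0.10 / 0.32}

# Combined nonmastery state: add the per-item posteriors of M2 and M3.
right_m2c = post_right["M2"] + post_right["M3"]   # about .412
wrong_m2c = post_wrong["M2"] + post_wrong["M3"]   # .6875

# Products over the 6 right and 4 wrong responses.
prod_m1 = post_right["M1"] ** N_RIGHT * post_wrong["M1"] ** N_WRONG
prod_m2c = right_m2c ** N_RIGHT * wrong_m2c ** N_WRONG

# Divide by the prior raised to N-1, then normalize over the two states.
scaled_m1 = prod_m1 / PRIOR_M1 ** (N_ITEMS - 1)
scaled_m2c = prod_m2c / PRIOR_M2C ** (N_ITEMS - 1)
p_m1 = scaled_m1 / (scaled_m1 + scaled_m2c)
p_m2c = 1 - p_m1

print(f"{prod_m2c:.2e}", round(p_m1, 3), round(p_m2c, 3))
```

With unrounded intermediates the sketch gives about .27 and .73; the printed .264 and .736 follow from the rounded products 3.9 × 10^-4 and 1.09 × 10^-3.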